Use lazy descriptor pool allocation #2285
Conversation
Dr. CI: ❌ 2 new failures, 1 unrelated failure as of commit 1c16288 with merge base 0570294. The broken-trunk job was already failing on the merge base; rebase onto the `viable/strict` branch to avoid it. Test results at hud.pytorch.org/pr/pytorch/executorch/2285.
This pull request was exported from Phabricator. Differential Revision: D54603935
Summary:

## Context

In Vulkan, memory for descriptor sets (which are used to bind data to shader arguments) must be pre-allocated. Previously, the convention was to allocate a large number of descriptor sets upon creation of a Vulkan Context. While this worked well in the Lite Interpreter, where only a single global Vulkan context is used, it leads to over-allocating descriptor sets in the Vulkan Delegate, where every `ComputeGraph` has its own dedicated Context.

pytorch/pytorch#121134 allows the descriptor set pool to be initialized in a deferred fashion. This means that a `ComputeGraph` can count the total number of descriptors needed across all the compute shaders that will be encoded, and then allocate a descriptor set pool of the appropriate size.

## Implementation Overview

1. When constructing `ComputeGraph`, make sure that the descriptor pool config contains 0 for the maximum number of sets. This ensures that no descriptor pool is initialized when constructing the graph's `api::Context` instance.
2. When building the graph, `ExecuteNode` and `PrepackNode` call `graph.update_descriptor_counts(shader)` upon construction, which allows `ComputeGraph` to count the total number of descriptor sets needed.
3. There is a separate descriptor count object for prepack and execute, since they correspond to different command buffers.
4. Before encoding any command buffers, call `graph.prepare()`, which constructs a descriptor pool config from the descriptor counts.

## Notes

One interesting finding is that I had to apply a safety factor to the descriptor counts to prevent the pool from running out of memory. This was reproducible on both Linux and Android. A more robust design, e.g. as discussed [here](https://www.reddit.com/r/vulkan/comments/17v66fi/question_about_descriptor_pool_allocations/), may be to maintain separate descriptor pools for each layout type. We should revisit this refactor at a later time.

Reviewed By: jorgep31415

Differential Revision: D54603935
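The counting scheme above can be sketched in a few lines. This is a hypothetical, language-agnostic model of the idea, not the actual ExecuTorch C++ API: the class name, the shader-layout representation, and the safety factor value of 2 are all illustrative assumptions.

```python
# Hypothetical sketch of deferred descriptor pool sizing: each node reports
# the descriptor types its shader layout uses, the graph tallies them, and
# only then is a pool config of the right size produced. The names and the
# safety factor here are assumptions for illustration, not the real API.
from collections import Counter

SAFETY_FACTOR = 2  # assumed value; the PR only says "a safety factor" was needed


class DescriptorCounts:
    def __init__(self):
        self.max_sets = 0          # one descriptor set per shader dispatch
        self.per_type = Counter()  # tally of descriptors by type

    def update(self, shader_layout):
        # Called once per node at graph-build time, before any pool exists.
        self.max_sets += 1
        for descriptor_type in shader_layout:
            self.per_type[descriptor_type] += 1

    def pool_config(self):
        # Apply the safety factor to every count before sizing the pool.
        return {
            "maxSets": self.max_sets * SAFETY_FACTOR,
            "sizes": {t: n * SAFETY_FACTOR for t, n in self.per_type.items()},
        }


# Usage: prepack and execute would each keep their own counts, since they
# correspond to different command buffers. Two example "shaders":
execute_counts = DescriptorCounts()
execute_counts.update(["UNIFORM_BUFFER", "STORAGE_IMAGE"])
execute_counts.update(["UNIFORM_BUFFER", "COMBINED_IMAGE_SAMPLER"])
config = execute_counts.pool_config()
# config["maxSets"] is 4 (2 dispatches x safety factor 2)
```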
This pull request has been merged in aed32c4.
## Context

While testing a toy model with a large number of operators, I ran into an issue on my local Pixel 6 Android device where the descriptor pool was running out of memory. This changeset implements a simple fix to ensure that descriptor pools do not run into this issue. A longer-term solution is to implement layout-specific descriptor pools, but that is much more technically complex, so go with this for now.

## Problem Details

#2285 made it so that `ComputeGraph` could tally up the total number of descriptors needed and size the descriptor pools appropriately, but it seems that this is not compatible with certain Vulkan drivers. In the toy model, 1000 binary operators were added. Counting the descriptors required for the graph gives:

```
descriptorPoolMaxSets: 1255
descriptorUniformBufferCount: 5013
descriptorStorageBufferCount: 4
descriptorCombinedSamplerCount: 2504
descriptorStorageImageCount: 1254
```

This appears to be correct; however, the descriptor pool still runs out of memory due to an insufficient `descriptorStorageBufferCount`. The `descriptorStorageBufferCount` needs to be set to a surprisingly high number (approx. 1000) before the descriptor pool stops running out of memory. I'm not sure exactly what causes this behaviour, but it could be due to implementation details of the driver.

## Solution

Ensuring that all descriptor counts are greater than or equal to the maximum number of descriptor sets seems to work. Implement this as a temporary solution.

Differential Revision: [D54853788](https://our.internmc.facebook.com/intern/diff/D54853788/)

[ghstack-poisoned]
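The workaround described in the solution can be sketched as a small clamping pass over the tallied pool config. This is an illustrative model only, using assumed names and the pool-config shape from the counts listed above; it is not the actual ExecuTorch implementation.

```python
# Hypothetical sketch of the temporary workaround: raise every per-type
# descriptor count to at least the maximum number of descriptor sets, since
# some drivers exhaust the pool even when the tallied counts look correct.
# Names and the config shape are assumptions for illustration.
def clamp_descriptor_counts(config):
    max_sets = config["maxSets"]
    return {
        "maxSets": max_sets,
        # Counts already above max_sets are left unchanged.
        "sizes": {t: max(n, max_sets) for t, n in config["sizes"].items()},
    }


# The counts from the 1000-operator toy model described above:
counts = {
    "maxSets": 1255,
    "sizes": {
        "UNIFORM_BUFFER": 5013,
        "STORAGE_BUFFER": 4,
        "COMBINED_IMAGE_SAMPLER": 2504,
        "STORAGE_IMAGE": 1254,
    },
}
clamped = clamp_descriptor_counts(counts)
# STORAGE_BUFFER is raised from 4 to 1255; larger counts are unchanged.
```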
Summary: Pull Request resolved: #2398. Reviewed By: jorgep31415. Differential Revision: D54853788.